Abstract 

In the Manchester Pharmacy School, we first adopted summative on-line examinations in 2005. 

Since then, we have increased the range of question types to include short answers, short essays 
and questions incorporating chemical structures and we achieve time savings of up to 90% in the 
marking process. Online assessments allow two novel forms of feedback. An anonymised 
spreadsheet containing all the marked exam scripts is made available to all students. This enables 
students to see a variety of answers that are awarded good marks, rather than a single model 
answer. Secondly, "Smallvoice", a novel app, provides confidential personalised feedback. Feedback 
statements, though written by the instructor, are selected by a computer in response to various 
aspects of a student's performance. Evidence of improved student satisfaction comes from the unit 
questionnaires and from the National Student Survey. Evidence of improved learning comes from 
comparing pre- and post-feedback assessments (typically course tests and end of unit 
examinations). 

Introduction - perceptions of online assessment 

Around examination time, when backbreaking piles of answer booklets are carried from one place to 
another, carefully counted to ensure that the only copy of any student's script does not go missing, 
and finally pored over in the hope of understanding what was meant by a particular squiggle, online 
assessment seems very attractive. Online assessments are easily stored and legible, answers by 
different students can be readily compared, and files can be organised in ways that facilitate 
feedback. 

Yet decades after the introduction of online assessment, it is still practised by a small minority of 
higher education institutions (Bull and McKenna 2004). Both the published literature and our own 
experience suggest that there is widespread doubt that online assessment is secure, flexible or 
reliable. 

This paper describes our experience in online assessment over 15 years. We have sought to 
maximise the security, flexibility and reliability of our assessments, especially our summative 
assessments. Staff have benefitted from the speed and accuracy of online assessment but, crucially, 
students have benefitted from a range of novel forms of feedback made possible by online assessment. 

Security, Flexibility and Reliability: The Manchester Experience 1998-2012 

In 1997 we began a project entitled "What makes a student succeed?" (Sharif et al. 2003; 2007a and 
b), requiring the use of several assessments in the first week of the MPharm course. The results 
were used to assign students to foundation groups, and the assessments were therefore high stakes, 
though not summative. The start of this project coincided with the University's introduction of 
so-called CBA (computer based assessment) software, which was a modification of the commercially 
available Questionmark software. 

Students sat the password-protected tests in University computer clusters. Passwords were issued 
immediately prior to the tests, which were invigilated and conducted under standard examination 
conditions (no books, paper, coats etc permitted). There is no evidence of any security breach at 
any point. 

Question types were limited to automatically-marked multiple choice, text match and numerical 
questions. While this was adequate for the purpose of assigning students to foundation groups, it 
would not permit a full range of assessments in a Pharmacy programme to be conducted online. 

During this period, all our worst fears about computer reliability were realised. In the period 
1998-2004, the testing proceeded smoothly only in 2002. In every other year there was a failure of some 
sort. These ranged through network failures due to excessive traffic, a virus affecting the whole 
university system, human error during system maintenance, and a leak in the roof above the 
computer cluster. 

In 2005, we began to explore the use of WebCT (later to be superseded by Blackboard 8 and 
Blackboard 9) for summative assessment. This was prompted by an increase in student numbers in a 
first year Cell Biology and Biochemistry class to over 200. The teaching time on the unit was 48 
hours, but the paper-based assessment was taking 60 hours to mark. The paper-based assessment 
was replaced by an online examination in which the questions were all automatically marked, a 
mixture of multiple choice and text match questions. Human intervention in the marking process 
was now minimal, with less than one hour required. 

We were satisfied that students would not be able to access the assessment prior to the 
examination, but we were initially concerned that students might be able to access unauthorised 
websites during the assessment. Invigilators were trained to check the computer taskbars for 
minimised icons and to investigate any that looked suspicious. The architecture of the clusters also 
gave cause for concern - these had been designed for teaching and private study, with collaboration 
encouraged. Invigilators also had to be wary of students looking at one another's screens. 

We gradually increased the range of question types available. Short answers and even short essays 
are supported by the Blackboard 9 assessment software. These can be marked either online within 
Blackboard or by downloading a csv file and marking in an Excel spreadsheet. Marking in a 
spreadsheet is undoubtedly quicker than marking online because students' answers to a particular 
question are arranged in a column; scrolling from one answer to the next is much quicker than 
closing one file and opening another (see Fig. 1). 

Question q41/1: Mrs LP, diagnosed with ankylosing spondylitis, has suffered gastro-intestinal problems whilst taking the NSAIDs. The prescriber would like to try the patient on the selective COX 2 inhibitor celecoxib. What should the initial dose be if Mrs LP has moderate hepatic impairment? (1 mark) 

Question q42/1: The half-life of celecoxib is normally around 8 hours but can increase to around 13 hours in hepatic impairment. What range of time would you expect it to take till steady state plasma levels are reached? (1 mark) 

Question q4: The apparent volume of distribution is around 450 L which is a relatively high value. What does this tell us about the distribution of celecoxib? (1 mark) 

[Callout: Question text with student answers in a column] 
[Callout: Column of marks awarded] 

q41/1 answer | Mark | q42/1 answer | Mark | q4 answer 
100mg daily in 1-2 divided doses. | 1 | 40 to 65 hours. | 1 | Most of the drug is in tissues (Extravascular fluid) and not in the blood plasma. 
100mg daily in 1-2 divided doses | 1 | 5 half lives so 40-65 hours | 1 | Celecoxib is highly distributed from the bloodstream into extracellular tissues. There is a higher concentration of Celecoxib in the extracellular tissues than in the bloodstream. 
100mg daily | 1 | 24-39 hrs | 0 | At any given time the majority of celecoxib is distributed throughout the tissues 
100mg daily in 1-2 divided doses. She can either take 100mg once per day or 50mg twice a day. | 1 | 5 hours | 0 | Large amount of Celecoxib drug is distributed into the body tissues most of the time, small amount of Celecoxib drug is distributed in the blood. 
150mg | 0 | 40-65 hours (13x5=65 and 8x5=40) | 1 | It is ditributed to most tissues in the body 
100mg (in 1-2 divided doses) | 1 | 24 hours | 0 | Because the volume of distribution is relatively high, it means that the majority of the Celecoxib is distributed within the tissues rather than in the blood. 
100mg | 1 | 40 hours to 65 hours | 1 | At any one point in time, there is a high extent of distribution of celecoxib around body tissues. 
100mg in 1-2 divided doses. | 1 | 40-65 hours | 1 | The high volume of distribution means the drug is mainly in the tissue rather than the systemic blood circulation. 

Figure 1. A part of an Excel spreadsheet used for marking. The examiner sees all answers to a 
question in a column and marks (quickly) in a column alongside the answers. The spreadsheet can 
be sorted (for example by mark awarded) for checking. The same spreadsheet (with all identifiers 
removed) can be returned to the students as a form of feedback. 

We developed a bank of chemical structures such that the answer to a question could be a chemical 
structure. 153 structures were classified according to the heteroatoms (atoms other than C, H, N, O) 
they contain, the number and size of rings and functional groups. Filters allowed students to select 
appropriate structures quickly (see Fig. 2). 



Figure 2. Chloramphenicol: the structure contains two chlorine atoms, one six-membered ring and 
an amide function. 
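
The filtering described above can be illustrated with a minimal sketch. It is not the software used in the School: the `Structure` record, the `filter_structures` function and the two "hypothetical" bank entries are assumptions made for illustration, and the chloramphenicol entry records only the features named in the Figure 2 caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """One entry in a (hypothetical) bank of chemical structures."""
    name: str
    heteroatoms: frozenset        # element symbols used for classification
    ring_sizes: tuple             # one entry per ring, e.g. (6,) for one six-membered ring
    functional_groups: frozenset


def filter_structures(bank, heteroatom=None, ring_size=None, n_rings=None,
                      functional_group=None):
    """Return the structures that satisfy every filter that is not None."""
    selected = []
    for s in bank:
        if heteroatom is not None and heteroatom not in s.heteroatoms:
            continue
        if ring_size is not None and ring_size not in s.ring_sizes:
            continue
        if n_rings is not None and len(s.ring_sizes) != n_rings:
            continue
        if functional_group is not None and functional_group not in s.functional_groups:
            continue
        selected.append(s)
    return selected


# Chloramphenicol is classified as in the Figure 2 caption; the other two
# entries are invented placeholders so that the filters have something to reject.
BANK = [
    Structure("chloramphenicol", frozenset({"Cl"}), (6,), frozenset({"amide"})),
    Structure("hypothetical structure A", frozenset({"S"}), (5, 6), frozenset({"amide"})),
    Structure("hypothetical structure B", frozenset(), (6,), frozenset({"ester"})),
]

matches = filter_structures(BANK, heteroatom="Cl", ring_size=6, functional_group="amide")
print([s.name for s in matches])
```

Combining filters in this way narrows a bank of 153 structures to a short list from which a student can pick an answer.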

Both the University IT systems and Blackboard proved very reliable during the period up to 2011. 

We developed various pieces of bespoke software that allowed us, for example, to upload 
examination papers directly from Word, incorporating diagrams and chemical structures. 

During the January 2013 examination period, several examinations were hit by a failure to save 
answers, affecting about 17% of students. This was traced to a failure of the six Blackboard servers 
to transfer load effectively when traffic was high. The problem took several weeks to identify, at 
which point it was quickly corrected. 

The Advantages of Online Assessment: Accuracy, Speed and Feedback 

The 2013 failure resulted in a significant loss in confidence in online assessment, and a reduction in 
the number of examinations conducted online. It prompted a reconsideration of the advantages of 
online assessment, as well as the development of additional security features. 

The advantages of online assessment have been clearly outlined by McGee Wood et al (2005) and 
latterly Briggs (2015). The first and indisputable advantage is that computers are good at adding up 
- they total marks awarded quickly and accurately. Although academics like to think they are good 
at adding up, the evidence (Aojula et al 2006) is that they are not. Andrews (2005, personal 
communication) demonstrated that the error rate is typically 5% in a moderately complex 
assessment. It is therefore imperative that an assessment that is marked manually be totalled at 
least twice. 

A second advantage of online text-based examinations is legibility. Students' handwriting is often 
difficult to decipher, and it is much faster to mark typescript, which is inherently legible. McCann (2010) has 
demonstrated that these obvious advantages may not be sufficient to persuade academics to invest 
the initial effort in mastering the logistics of e-assessment. 

The real insight made by McGee Wood et al is that computers are good at finding script. Figure 1 
shows three questions, answered by many students, with the answers arranged one above the other 
in a spreadsheet. When answers are arranged in this way there is no time spent rifling through 
answer books trying to find the answer to a particular question. Academics who mark on a 
spreadsheet normally estimate time savings of a factor of between 2 and 10, depending upon the 
length of the typical answer (the time savings are less with longer answers where more time is spent 
marking and less is spent finding the answer). That most of the time saving arises from finding the 
script quickly is evidenced by a direct comparison. In Blackboard, it is possible to download the 
students' answers as a csv file, but it is also possible to mark directly onto an online document 
resembling an examination script. The former is much faster; we estimate a factor of 2, based on 
three academics marking similar assessments using each method. 

The same spreadsheet format permits improvements in accuracy of marking. After marking a 
question, it is easy to sort the spreadsheet by mark and to confirm that answers achieving the same 
mark are comparable. This is very difficult to achieve in a paper-based examination. 
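
The consistency check can be sketched in a few lines. This is an illustration only, assuming the marked column has already been read into (answer, mark) pairs; the function name `answers_by_mark` is hypothetical and the sample answers are taken from the second question in Figure 1.

```python
from collections import defaultdict


def answers_by_mark(rows):
    """Group (answer, mark) pairs by mark, highest mark first,
    so that answers awarded the same mark can be read side by side."""
    groups = defaultdict(list)
    for answer, mark in rows:
        groups[mark].append(answer)
    return dict(sorted(groups.items(), reverse=True))


# Answers to the steady-state question shown in Figure 1.
marked = [
    ("40 to 65 hours.", 1),
    ("24-39 hrs", 0),
    ("5 hours", 0),
    ("40-65 hours (13x5=65 and 8x5=40)", 1),
    ("24 hours", 0),
]

grouped = answers_by_mark(marked)
for mark, answers in grouped.items():
    print(mark, answers)
```

Reading each group in turn makes an inconsistently marked answer stand out against its neighbours.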

All Student Feedback 

Marking on a spreadsheet leads to such confidence in consistency of marking that complete 
transparency is possible. Beginning in 2008, after securing the permission of the students, we 
stripped all identifiers from the marked examination spreadsheet and made it available to all the 
students who had sat the examination; essentially students saw figure 1, but extended for all 
questions. Thus students could not only see where they gained or lost marks, but could see a range 
of very good answers for each question. In a class of over 120 second year students, all but one 
reported this feedback to be useful or very useful. 
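
A minimal sketch of the anonymisation step follows. The column names in `IDENTIFIER_COLUMNS` and the student identifiers are hypothetical, and shuffling the rows is an extra precaution assumed here (so that row order does not reveal identity) rather than something described above.

```python
import random

# Hypothetical names for the columns that identify a student.
IDENTIFIER_COLUMNS = {"student_id", "name", "email"}


def anonymise(rows, identifier_columns=IDENTIFIER_COLUMNS, seed=None):
    """Drop identifying columns from each row and shuffle the row order."""
    stripped = [
        {key: value for key, value in row.items() if key not in identifier_columns}
        for row in rows
    ]
    random.Random(seed).shuffle(stripped)
    return stripped


marked_rows = [
    {"student_id": "s001", "name": "Student A", "answer": "100mg daily", "mark": 1},
    {"student_id": "s002", "name": "Student B", "answer": "150mg", "mark": 0},
    {"student_id": "s003", "name": "Student C", "answer": "100mg", "mark": 1},
]

released = anonymise(marked_rows, seed=2008)
for row in released:
    print(row)
```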

This type of feedback gives students valuable insights into the marking process which model answers 
cannot give. Very good answers, especially longer answers, look very different from one another 
and are, at least in theory, more valuable than model answers. 

In many years of using All Student Feedback with large classes the present authors have received no 
complaints about their marking. One third year student summed up the student response verbally: "I 
thought you were a bit harsh in question 2, but that I got lucky in question 3." On one occasion, a 
student noticed that a question had not been marked. She alerted the examiners immediately, and 
the mistake was rectified. 

More Feedback - The Smallvoice App 

It is quite common for a student to approach an academic following an assessment and to say 
"Where did I go wrong?" Armed with an online assessment, the academic may scan the row 
corresponding to the particular student's assessment and offer some analysis. 

Typical comments include: 

• You haven't answered all the questions. 

• You haven't revised a particular topic. 

• Your English is poor. 

Very often, even usually, such analysis could just as well be given by a well-programmed computer, 
as in the examples above. We therefore developed Smallvoice. 

Smallvoice provides rapid, automated, completely personal feedback on performance to students in 
large classes. It analyses many different aspects of a student's performance and synthesises 
accurate, confidential advice. Smallvoice is a freestanding tool, able to integrate with 
commonly-used data systems around the world. 

Smallvoice analyses an examination paper (either a computer-based examination paper or a 
transcript of marks from a paper-based examination) in the same way as an instructor might analyse 
a paper following an examination. It reports on a student's performance in different topics (for 
example different diseases) and different question types (e.g. factual recall, multiple choice, critical 
argument). In addition it analyses performance in ways that are much easier for a computer 
program than for an instructor. It incorporates a powerful algorithm for discrimination values, so is 
able to comment on whether a student fared better in the easier or more difficult questions relative 
to the rest of the class. It correlates performance in summative assessment with attendance and 
with performance in past formative and summative assessments. Students receive a detailed email 
showing where they are in the class, trends in their performance, and incorporating links to 
sophisticated statistics about individual questions. The feedback is made up of text inputted by the 
instructors and is therefore personal in tone; it is at its most powerful when used to congratulate 
good students, to encourage average and weaker students and to give advice about preparation for 
future learning. 
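
The general idea of selecting instructor-written statements by computer can be sketched as follows. This is not the Smallvoice implementation, which is not described in detail here: the rule thresholds, the statement wording, the question identifiers and the function names are all assumptions chosen for illustration.

```python
# Instructor-written templates, checked in order; the first rule whose
# minimum percentage is met is used. Thresholds and wording are hypothetical.
RULES = [
    (70.0, "Well done on {topic}: you scored {pct:.0f}%."),
    (50.0, "Your work on {topic} is sound ({pct:.0f}%), with room to improve."),
    (0.0, "Please revise {topic}: you scored {pct:.0f}%."),
]


def topic_percentages(questions, marks):
    """questions maps question id -> (topic, maximum mark);
    marks maps question id -> mark awarded (missing answers count as 0)."""
    earned, available = {}, {}
    for qid, (topic, max_mark) in questions.items():
        earned[topic] = earned.get(topic, 0) + marks.get(qid, 0)
        available[topic] = available.get(topic, 0) + max_mark
    return {topic: 100.0 * earned[topic] / available[topic] for topic in available}


def select_statements(percentages, rules=RULES):
    """Pick one instructor-written statement per topic."""
    statements = []
    for topic, pct in sorted(percentages.items()):
        for minimum, template in rules:
            if pct >= minimum:
                statements.append(template.format(topic=topic, pct=pct))
                break
    return statements


questions = {
    "q1": ("hepatic impairment", 2),
    "q2": ("hepatic impairment", 2),
    "q3": ("pharmacokinetics", 4),
}
student_marks = {"q1": 2, "q2": 1, "q3": 1}

percentages = topic_percentages(questions, student_marks)
feedback = select_statements(percentages)
print("\n".join(feedback))
```

Because the templates are written by the instructor, the output keeps a personal tone even though the selection is automatic.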

Smallvoice hugely increases student satisfaction. We have received numerous emails of 
appreciation from students. In the pilot course unit about 10% of the cohort sent unsolicited 
messages of thanks and the feedback score was 4.69 /5 in the University's course evaluation 
questionnaire. We have also seen average marks rise 10-20% between successive examinations 
following feedback. Smallvoice lends itself to feedback that advises students about improving 
performance, which (like the personal tone) is a hallmark of current perceptions of good feedback 
(Price et al 2010; Boud & Molloy 2013). 

Sample Output 


Dear [Forename], 

Here is some feedback following your semester one exams. Your weighted mean 
for semester 1 was 65.4% and the mean for the cohort was 67.2%. Your position in 
the group was therefore 98=. This was a good solid 2.1 performance in semester 
one. Well done! Your semester 1 mark is significantly higher than your year 3 mark 
so very well done! 

The second year contributes 10% to your final degree classification and the third 
year contributes 20%. In the fourth year so far you have completed 50 credits out of 
120, that's another 29.2% of your degree. 40.8% remains. 

[Annotation: The opening text is common to all students except that the computer 
inserts individual marks. General comments on performance vary according to the 
overall mark and there is also a comment comparing this assessment to previous 
performance. By the end of the first paragraph, the feedback is significantly 
personalised.] 

Table 1 shows you the average mark you need in semester 2 to get each class of 
degree. 

Table 1: Averages required for the rest of your degree 

to get a first you need     80.1% 
to get a 2.1 you need       55.7% 
to get a 2.2 you need       31.2% 
to get a third you need      6.7% 

[Annotation: Students often express gratitude for this item of feedback. For this 
final year student, it is clear that a 2.1 can be achieved by doing more of the same.] 

Do remember though, that the average is not quite the whole story. You have to 
pass all the modules! 

Table 2 shows a summary of your module marks compared with the class averages. 
Your mark in Law was especially commendable. 

Table 2: Summary of your module marks 

Module            Law     Dispensing   Social Pharmacy   Microbiology   Neuro-pharmacol 
Your Mark         80.9    79.2         63                68             58 
Your Position     47=     112=         128=              64=            20 
Number in class   170     170          170               152            29 
Class mean mark   75.2    81.5         69.9              64.8           61.0 

[Annotation: Some modules in Pharmacy have pass marks of 60% and high class means. 
It is helpful to students to have the class mean and their own position in order to 
assess progress.] 

You're progressing very well. Good luck with the rest of the semester. 

Best wishes 
Jill and Steve 

Figure 3. Part of a Smallvoice email sent to a final year student following semester 1 examinations. 

Figure 3 shows an example of part of the feedback used to support the end of semester one 
examinations for fourth year students. Smallvoice can also be used to give very fine-grained 
feedback (for example a discussion of individual questions in a single course assessment). 
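
The arithmetic behind Table 1 of the sample email can be sketched as below. Several values are assumptions: the classification boundaries of 70, 60, 50 and 40 are the conventional ones and are not stated in the email, the 70% weight for the fourth year is inferred from the 10% and 20% weights quoted, and the year 2 and year 3 marks of 60.0 and 61.0 are hypothetical values chosen for illustration because the email does not give them.

```python
def required_average(threshold, earned_points, remaining_weight):
    """Average needed over the remaining credits to finish on `threshold`."""
    return (threshold - earned_points) / remaining_weight


# Weights quoted in the sample email; the year 4 weight is inferred.
YEAR2_WEIGHT = 0.10
YEAR3_WEIGHT = 0.20
YEAR4_WEIGHT = 1.0 - YEAR2_WEIGHT - YEAR3_WEIGHT

completed_weight = YEAR4_WEIGHT * 50 / 120   # 50 of 120 final-year credits done
remaining_weight = YEAR4_WEIGHT * 70 / 120   # 70 credits still to come

# Hypothetical year 2 and year 3 marks; 65.4 is the semester 1 mean from the email.
year2_mark, year3_mark, semester1_mean = 60.0, 61.0, 65.4
earned_points = (YEAR2_WEIGHT * year2_mark
                 + YEAR3_WEIGHT * year3_mark
                 + completed_weight * semester1_mean)

# Assumed classification boundaries (percent).
BOUNDARIES = {"first": 70, "2.1": 60, "2.2": 50, "third": 40}

required = {
    name: round(required_average(threshold, earned_points, remaining_weight), 1)
    for name, threshold in BOUNDARIES.items()
}
print(round(completed_weight * 100, 1), round(remaining_weight * 100, 1))
print(required)
```

With these illustrative inputs the completed and remaining weights come out at 29.2% and 40.8%, as in the email, and the required averages round to the values shown in Table 1.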

The Future of Online Assessment 

Given the advantages of online assessment to both academic staff and students, progress in 
delivering secure, flexible, easily-managed and (above all) reliable assessment has been 
disappointing. The delivery of online assessment requires an enormous amount of care, and the 
support of local in-house IT experts. 

Examination infrastructure 

To ensure security during examinations, the University of Manchester has developed computer 
clusters specifically for examinations. Computers are widely spaced and screens cannot easily be 
seen by a student's neighbours. A specific "examination desktop" is loaded onto the cluster 
machines prior to the examination period and websites outside Blackboard cannot be viewed. 


This feature has led to the development of a novel online open-book examination format, in which 
students are able to access specific materials contained within the same Blackboard folder as the 
examination. 

The conduct of online examinations is now coordinated by a specific member of the Examinations 
Office. Protocols for paper-based examinations have evolved over many decades to accommodate 
several examinations taking place in the same very large room. Online examinations present a new 
paradigm. A single examination may be housed in several different remote rooms. Ensuring 
consistency between rooms is a significant challenge, requiring efficient communication between 
several rooms (carried out via online messaging). 

Load testing 

The 2013 failure prompted us to develop load testing protocols to be carried out ahead of every 
examination period. The intention during load testing was to provide evidence that the current 
deployment of our virtual learning environment was fit for purpose and that there was a relatively 
low risk of encountering any load related issues during the setup or running of our online exams. 
Several clusters of desktop machines were used in the testing with a combined provision of 
approximately 400 machines. A version of the Mozilla Firefox browser was modified so that it could 
simulate individual student activity during setup and running of an online examination. This browser 
was started on each machine so that the behaviour of 400 virtual students could be arranged and 
synchronised during the period allocated for testing. 

Two tests are conducted on the Blackboard infrastructure. The first introduces gradual load 
(achieved by conducting a real exam on each PC) onto Blackboard up to the maximum PCs available 
across all the clusters used. When this capacity is reached, the exam is allowed to continue for 
approximately 15 minutes to test for sustained load on Blackboard. The second test starts all the 
exams together to simulate peak load of the system. 

The virtual learning environment configuration at the University of Manchester is currently 
composed of 10 application servers. The advice from our hosting partners has been that this 
configuration is over specified for our actual use. The intention of load testing was to prove that the 
servers would cope without failure with the load being generated during examinations and, equally 
importantly, that they would do so comfortably. Whilst it is often difficult to correlate load and system 
utilisation, one measure that can be used is the number of queued processes within the processor of 
an application server. The larger the number the more likely there is to be service degradation, or 
service loss (either partial or full). If a sustained load of "20" was observed an investigation was 
triggered. If an application server reached a value of "50" an automated procedure would take it out 
of the processing pool so that no additional load would be transferred to it. During both tests 
undertaken during our recent load testing, the maximum number of queued processes observed was 
"6", with a typical value being between "1" and "3". 

Load testing is, we believe, a necessary prelude to online examinations. 
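
The queue-length thresholds described above can be expressed as a simple rule. The sketch below is an illustration, not the monitoring procedure actually deployed: the function name is hypothetical, and treating "sustained" as three consecutive samples at or above the threshold is an assumption, since no definition is given here.

```python
INVESTIGATE_AT = 20   # sustained queue length that triggers an investigation
REMOVE_AT = 50        # queue length at which a server leaves the processing pool


def assess_server(queue_samples, sustained_samples=3):
    """Classify one application server from successive queue-length readings.

    `sustained_samples` is an assumed definition of "sustained": that many
    consecutive readings at or above INVESTIGATE_AT.
    """
    if any(q >= REMOVE_AT for q in queue_samples):
        return "remove_from_pool"
    run = 0
    for q in queue_samples:
        run = run + 1 if q >= INVESTIGATE_AT else 0
        if run >= sustained_samples:
            return "investigate"
    return "ok"


# Readings in the range reported for the load tests (typically 1 to 3, maximum 6).
print(assess_server([1, 3, 6, 2]))
```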

Downloads 

Downloading examinations in csv format also requires specialist tools. Blackboard, for example, 
enables HTML, which is not rendered directly. In general, this is removed manually. 

More inconvenient still is that students occasionally use a character, such as a hyphen, as a bullet 
point. Excel recognises this as a delimiter, and a student's answer may be truncated as a result of its 
use. The solution is to brace each answer inside | characters. This can be achieved in a number of 
ways by opening the csv file initially in a program other than Excel. 
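
Both clean-up steps can be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the sample answers are invented, and it assumes the spreadsheet program is then told to treat | as the text qualifier on import.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(answer):
    """Remove HTML tags from an answer and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(answer)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def write_braced_csv(rows, quotechar="|"):
    """Return csv text in which every field is wrapped in `quotechar`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quotechar=quotechar, quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for row in rows:
        writer.writerow([strip_html(str(cell)) for cell in row])
    return buffer.getvalue()


cleaned = strip_html("<p>- 40 to 65 hours</p><p>- 5 half lives</p>")
braced = write_braced_csv([["q42", "<b>40-65 hours</b>"]])
print(cleaned)
print(braced, end="")
```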

Drawing tools 

In Pharmacy and related subjects, online assessment will only come of age when the drawing of 
diagrams and chemical structures within the assessment is enabled. 

Discussion 

Security of online examinations remains a concern. Nevertheless, this is a concern that Universities 
have so far proved able to meet. An online assessment must of necessity be mounted on a server 
ahead of the examination period; this means that it can, in principle, be hacked. There is a body of 
literature devoted to the security of online assessments (Apampa et al. 2009). Of course, it is, in 
principle, possible for students to break into a University safe containing paper-based examinations, 
but this is a familiar risk, one that we are content to live with. Even though university online security 
systems are generally of a very high standard (universities maintain personal and often medical 
data), it is much easier for academic staff to imagine their students as computer hackers than as safe 
breakers. 

As of 2013, online examinations did not provide the flexibility of question type that we ultimately 
require, but neither were they restricted to multiple choice questions. 

Reliability remains the key issue that prevents many academics embracing online assessment 
(Warburton, 2009). Computers are inherently unreliable. A typical 50-seat computer cluster in a 
University might be expected to have two or three computers out of commission for various reasons 
at any one time. This is a level of unreliability we would find unacceptable in our cars or washing 
machines. The fear is always of a computer failure mid-assessment, so that student work is lost. 
Some of the published literature incorporates elaborate schemes for backing up student work on 
paper (Aojula et al. 2006). 

Further, an online assessment produces dependence. An academic conducting a paper-based 
assessment maintains the impression or illusion of control. Academics conducting online 
examinations become dependent upon IT staff, whose expertise is usually quite alien. Attempts to 
guide academics through the process serve to reinforce dependence (Willis et al. 2009). 

Online assessment has the potential to be enormously powerful, saving time, giving improved 
accuracy and transparency and greatly facilitating feedback. Holmes (2015) and others have also 
pointed to the frequent use of simpler e-assessments as a means of improving student engagement. 
Feedback deserves special consideration. Because an on-line assessment can be modified and 
reproduced, or readily loaded into another computer program, such as Smallvoice, it is possible to 
give students high quality feedback with reasonable throughput. When the Smallvoice prototype 
was introduced in 2011, we saw an increase of 19 percentage points (from 46% to 65% satisfaction) 
in the "Assessment and Feedback" component of the School's NSS score in a single year. Progress since then 
has, however, been slow. Sector-wide, students find feedback less than completely satisfactory, and 
the current challenge is to embed feedback into the curriculum, thereby managing students' 
expectations. Our current effort is focussed on understanding how students respond to feedback, 
and developing tools to enable them to comment on their feedback, thereby opening ongoing 
dialogue. 